1.  Summary

One of the most important aspects of the analysis of data by
regression methods is the examination of residuals. This implies
the careful inspection of the differences,
$e_i = Y_i - \hat{Y}_i$, $i = 1, 2, \ldots, n$, between the observed
values $Y_i$ and the corresponding values $\hat{Y}_i$ which are
predicted by the fitted model at the n observation sites. There are
many ways of looking at residuals; see, for example, Draper and
Smith (1966, Chapter 3) and Wooding (1969). Important basic
techniques are those of plotting the residuals against their
corresponding fitted values, or against the corresponding values of
the independent variables, or against the corresponding values of
"new" variables, and (in all cases) observing the pattern thus
formed.

Draper and Smith say that an "ideal" pattern for most plots, which
implies no denial of the regression assumptions, occurs when the
residuals form a "horizontal band." This is always true for the
so-called "fixed effect" analysis of variance models. In fitting
models with continuous variables, it is usually true within the
practical limitations of most plots, but is not precisely true
theoretically because the residuals are not independent, nor do
they all have the same variance.

The purpose of this note is to point out that there is likely to be
at least a slight pattern of changing magnitude of the residuals in
such plots and that, if such an effect is at all pronounced (as it
may well be, given certain properties of the design matrix), then
the variance-covariance structure of the residuals should be taken
into account in the analysis.

2.  Introduction  and  Discussion 


Suppose the model

$$ y = X\beta + \varepsilon \tag{1} $$

is fitted by least squares, where $y$ is an $n \times 1$ vector of
observations, $X$ an $n \times q$ matrix of known constants,
$\beta$ a $q \times 1$ vector of unknown parameters, and
$\varepsilon$ an $n \times 1$ vector of randomly distributed errors.
We make the usual assumptions that $E(\varepsilon) = 0$ and
$V(\varepsilon) = I\sigma^2$. The least squares estimate of $\beta$
is given by $b = (X'X)^{-1}X'y$, and the vector of residuals is

$$ e = y - \hat{y} = y - Xb = (I - R)y = (I - R)\varepsilon \tag{2} $$

where $R = X(X'X)^{-1}X'$. Thus the residuals can be regarded as the
same linear transformation of the known observations $y$ or the
unknown errors $\varepsilon$. It also follows that

$$ E(e) = 0 \tag{3} $$

and

$$ V(e) = (I - R)\sigma^2 \tag{4} $$

when the variance assumptions of the model are correct.

Since $V(\hat{y}) = R\sigma^2$, the variances of the n individual
residuals are given by

$$ V(e_j) = \sigma^2 - V(\hat{y}_j), \quad j = 1, 2, \ldots, n \tag{5} $$

$$ \phantom{V(e_j)} = (1 - r_{jj})\sigma^2 \tag{6} $$

where $r_{jj}$ is the jth diagonal element of $R$. The pattern of
the variances of the residuals is therefore the complement of that
of the predicted values $\hat{y}$. It is evident that

$$ 0 \le (1 - r_{jj}) \le 1 \tag{7} $$

since $V(e_j)$ is non-negative and $r_{jj}$ is a positive definite
quadratic form. $V(e_j)$ is zero only when $e_j = 0$ independent of
$y$, such as in saturated designs (when $n = q$) or when the
peculiarities of the design force $\hat{y}_j$ to equal $y_j$ exactly.

For example, the residual at the center point of certain second
order three-level designs (see Box and Behnken, 1960) with one
center point will always be zero, and $V(e) = 0$ there.
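As an illustration of the relation $V(e_j) = (1 - r_{jj})\sigma^2$, the following minimal Python sketch computes the diagonal of $R = X(X'X)^{-1}X'$ for a small design matrix. The function name `hat_diagonal`, the Gauss-Jordan inversion, and the five-point straight-line design used as input are assumptions made for illustration only; they are not taken from the paper.

```python
def hat_diagonal(X):
    """Return the diagonal r_jj of R = X (X'X)^{-1} X' for a full-rank X (list of rows)."""
    n, q = len(X), len(X[0])
    # Augmented matrix [X'X | I]; Gauss-Jordan turns it into [I | (X'X)^{-1}].
    A = [[sum(X[t][i] * X[t][j] for t in range(n)) for j in range(q)]
         + [1.0 if i == j else 0.0 for j in range(q)] for i in range(q)]
    for c in range(q):
        p = max(range(c, q), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for r in range(q):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    inv = [row[q:] for row in A]
    # r_jj is the quadratic form x_j' (X'X)^{-1} x_j for each row x_j of X.
    return [sum(x[i] * inv[i][j] * x[j] for i in range(q) for j in range(q)) for x in X]


# Illustrative design: straight line with constant term at five equally spaced sites.
X = [[1.0, float(x)] for x in (-2, -1, 0, 1, 2)]
factors = [1.0 - r for r in hat_diagonal(X)]  # V(e_j) / sigma^2
print([round(f, 2) for f in factors])
```

For a saturated design (n = q) the same function returns $r_{jj} = 1$ at every site, so every residual variance is zero, in line with the remark above.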


2.1.  Residuals  for  a  first  order  model  with  a  constant  term. 


In fitting a straight line model $\eta = \beta_0 + \beta x$ we can
recall (Draper & Smith, 1966, p. 23) that $V(\hat{y})$ increases as
the distance of $x$ from the mean value $\bar{x}$ of the observed
x's increases. Figure 1a shows a typical band of 95% confidence
intervals for the expected values of $y$ derived from five equally
spaced observations. As a consequence, residuals at x-sites closer
to $\bar{x}$ will have larger variance than residuals further away.
Figure 1b shows the pattern of the standard deviation of the
residuals for five equally spaced observations. Note that we do not
extend the "balloon pattern" outside the range of x for which we
have observations and, actually, the pattern really consists only
of individual vertical intervals at the points at which we have
observations. These intervals are drawn in Figure 1b to be of width
$2\sigma^{-1}\{V(e_j)\}^{1/2}$ (twice the numbers written on the
ordinates of the observation sites) and the end-points are joined
by a smooth curve simply to show the variation more clearly. The
ratio between the center residual standard deviation and that of an
outside point is $2^{1/2} \approx 1.4$. Such variation in standard
deviation would usually not be discernible in a typical residuals
plot.
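The factor $2^{1/2}$ can be checked by hand. The worked arithmetic below is a sketch that assumes the five sites are coded as $x = -2, -1, 0, 1, 2$ (any equally spaced coding gives the same ratio) and uses the standard straight-line expression for the diagonal elements $r_{jj}$ of $R$.

```latex
r_{jj} = \frac{1}{n} + \frac{(x_j - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}
       = \frac{1}{5} + \frac{x_j^2}{10},
\qquad
1 - r_{jj} =
\begin{cases}
0.8, & x_j = 0,\\
0.7, & x_j = \pm 1,\\
0.4, & x_j = \pm 2,
\end{cases}
\qquad
\left(\frac{0.8}{0.4}\right)^{1/2} = 2^{1/2} \approx 1.41 .
```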

The severity of this "ballooning" of $V(e_j)$ depends on the actual
values of the x's used in the regression and may or may not be
important in a practical problem. If the variances of the residuals
varied a great deal, it would be worthwhile to examine the
$e_j/(1 - r_{jj})^{1/2}$ instead of the $e_j$ in the usual residuals
plots, and to use the more correct $e_j/\{s(1 - r_{jj})^{1/2}\}$,
instead of $e_j/s$, as the "normal deviate form" of the residuals.
In many cases, as we shall illustrate via examples, this refinement
is not needed, but in some cases it may be helpful to avoid possible
misinterpretations.
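A minimal Python sketch of the two "normal deviate" forms, $e_j/s$ and $e_j/\{s(1 - r_{jj})^{1/2}\}$, for a straight-line fit follows. The function name and the small data set are hypothetical and chosen only to make the difference visible; the leverage expression is the usual one for a straight line with a constant term.

```python
import math


def straight_line_standardized_residuals(x, y):
    """Return (e_j/s, e_j/(s*sqrt(1 - r_jj))) for a least squares straight line."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(ei ** 2 for ei in e) / (n - 2))   # root residual mean square
    r = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]  # diagonal elements r_jj
    plain = [ei / s for ei in e]
    adjusted = [ei / (s * math.sqrt(1.0 - ri)) for ei, ri in zip(e, r)]
    return plain, adjusted


# Hypothetical data: three observations at the center, one at each extreme.
x = [-5.0, 0.0, 0.0, 0.0, 5.0]
y = [1.0, 2.5, 3.0, 3.5, 6.0]
plain, adjusted = straight_line_standardized_residuals(x, y)
for xi, p, a in zip(x, plain, adjusted):
    print(f"x = {xi:5.1f}   e/s = {p:6.2f}   e/(s*sqrt(1-r)) = {a:6.2f}")
```

In this hypothetical layout the end-point residuals are rescaled by $1/\sqrt{0.3}$ and the central ones by only $1/\sqrt{0.8}$, which is the kind of difference the adjusted form is meant to expose.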


The ballooning of the residual variance at the center of gravity of
the data will occur in general whenever first order models with
constant terms are fitted, i.e. whenever we fit

$$ E(y) = \beta_0 + \sum_{i=1}^{k} \beta_i x_i . \tag{8} $$

We can assume without loss of generality that the $x_i$'s are coded
so that $\bar{x}_i = \sum_{j=1}^{n} x_{ij}/n = 0$ for
$i = 1, 2, \ldots, k$. Suppose we write $X = (\mathbf{1}, D)$ where
$D$ is the usual design matrix. Then
$R = \mathbf{1}\mathbf{1}'/n + D(D'D)^{-1}D'$, and it follows that

$$ V(\hat{y}_j) = [1/n + t_{jj}]\sigma^2 \tag{9} $$

where $t_{jj}$ is a positive definite quadratic form. Therefore

$$ V(e_j) = [(n-1)/n - t_{jj}]\sigma^2 \tag{10} $$

and $t_{jj} = 0$ only if the jth row of $D$ is at the centroid,
where $x_{ij} = 0$ for all $i$. $V(\hat{y}_j)$ must increase
monotonically away from this minimum, since it is a true quadratic
in the k independent variables of the first order model, and
$V(e_j)$ decreases correspondingly from its maximum at the centroid.
Hence for these models

$$ 0 \le V(e_j) \le [(n-1)/n]\sigma^2 . \tag{11} $$
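The decomposition of $R$ used above follows in one step once the columns of $D$ are centered. The derivation below is a sketch; the symbol $d_j'$ for the jth row of $D$ is notation introduced here and does not appear in the paper.

```latex
X'X = \begin{pmatrix} n & \mathbf{1}'D \\ D'\mathbf{1} & D'D \end{pmatrix}
    = \begin{pmatrix} n & 0 \\ 0 & D'D \end{pmatrix}
\quad (\text{since } \mathbf{1}'D = 0),
\qquad
R = X(X'X)^{-1}X' = \frac{\mathbf{1}\mathbf{1}'}{n} + D(D'D)^{-1}D',
\qquad
r_{jj} = \frac{1}{n} + d_j'(D'D)^{-1}d_j = \frac{1}{n} + t_{jj}.
```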
2.2.  Residuals for first order models without constant terms.


When a first order model without a $\beta_0$ term is fitted, the
ballooning pattern is replaced by a decreasing $V(e)$ as the
distance from the actual origin increases. Equations (6) and (7)
hold, with the maximum $V(e)$, of value $\sigma^2$, being achieved
only at the origin. Typically no observations would be taken at the
origin in fitting this model, since they do not enter into the
values of the estimates at all. If they were taken (e.g. to assist
in checking the assumption that $\beta_0 = 0$), it is clear that
relatively larger residuals should be expected there since
$\hat{y}$ must always be exactly zero, no matter where the
corresponding $y_j$'s lie.

2.3.  Residuals for models that are not first order.


When some of the $x_i$'s in a regression model are functions of
other $x_i$'s, as, for example, in the second order model

$$ y = \beta_0 + \sum_{i=1}^{k} \beta_i x_i
     + \sum_{i=1}^{k}\sum_{j \ge i}^{k} \beta_{ij} x_i x_j + \varepsilon, \tag{12} $$

equations (6) and (7) hold but no general statements can be made
about the location of maxima for $V(e)$ or their number.

When a constant term is included, the maximum would hypothetically
occur at the point in the k dimensional factor space corresponding
to the average of each of the q columns of the $X$ matrix. But such
a point may not, in fact, exist. For example, in a second order
model, when the $x_i$'s are at their average value the $x_i^2$'s
are not. The largest residual variance in a second order design may
not occur at the center point if the number of replicates there is
small enough (see Section 3).

2.4.  Average  value  of  the  variances  of  the  residuals. 

Since the average value of the variance of the predicted values at
the observation points is given by

$$ \bar{V}(\hat{y}) = n^{-1}\sum_{j=1}^{n} V(\hat{y}_j)
   = n^{-1}\,\mathrm{tr}\{X(X'X)^{-1}X'\sigma^2\} = q\sigma^2/n, \tag{13} $$

where $X$ is an $n \times q$ matrix, the average variance of the
residuals is

$$ \bar{V}(e) = n^{-1}\sum_{j=1}^{n} V(e_j) = (n-q)\sigma^2/n. \tag{14} $$

Even if the residual variance is reasonably constant, it might
still be useful to consider the magnitude of $e_j$ relative to
$s\{(n-q)/n\}^{1/2}$ rather than just $s$ (the estimate of $\sigma$)
in cases where n is not large relative to q.
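The trace step behind the average can be spelled out as follows. This is a short sketch using only the cyclic property of the trace; the numerical check assumes a straight line fitted at five equally spaced sites, for which the factors $1 - r_{jj}$ are 0.4, 0.7, 0.8, 0.7, 0.4.

```latex
\sum_{j=1}^{n} r_{jj} = \operatorname{tr}\{X(X'X)^{-1}X'\}
  = \operatorname{tr}\{(X'X)^{-1}X'X\} = \operatorname{tr}(I_q) = q,
\qquad
\sum_{j=1}^{n} V(e_j) = \sum_{j=1}^{n}(1 - r_{jj})\sigma^2 = (n - q)\sigma^2 ,
\qquad
0.4 + 0.7 + 0.8 + 0.7 + 0.4 = 3 = 5 - 2 .
```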

3.  Examples. 

3.1.  Straight line in one variable.


In Exercise A, page 35 of Draper and Smith (1966), 11 observations
of a response Y occur at 11 equally spaced values of a predictor
variable X, at X = -5, -4, ..., 4, 5. The model
$y = \beta_0 + \beta x + \varepsilon$ is fitted and the variances
of the residuals can be shown to be $\sigma^2$ times

0.68, 0.76, 0.83, 0.87, 0.90, 0.91, 0.90, 0.87, 0.83, 0.76, 0.68

Thus, the central residual has variance 1.3 times as great as the
extreme residuals. (For standard errors, the factor is thus 1.2.)
From a practical point of view such variation would not be
discernible in an actual residuals plot. The worst variation in the
residual standard error one could obtain for the same range with 11
points results from the design with 9 points at the center and one
each at $\pm 5$. The equivalent ratios here are 2.2 and 1.5, and a
correction might be worthwhile in such a case.
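The figures quoted in this example can be reproduced from the straight-line form $1 - 1/n - (x_j - \bar{x})^2/\sum_i (x_i - \bar{x})^2$ of the residual variance factor. The Python sketch below is illustrative only; the function name is an assumption, and the two designs are the ones described in the text.

```python
def line_residual_variance_factors(x):
    """V(e_j)/sigma^2 for a straight line with constant term fitted at sites x."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1.0 - 1.0 / n - (xi - xbar) ** 2 / sxx for xi in x]


equal = line_residual_variance_factors(list(range(-5, 6)))       # 11 equally spaced sites
clustered = line_residual_variance_factors([-5] + [0] * 9 + [5])  # 9 at center, one at each end

for name, v in (("equally spaced", equal), ("clustered", clustered)):
    ratio = max(v) / min(v)
    print(name, [round(f, 2) for f in v])
    print("  variance ratio:", round(ratio, 1), " std. dev. ratio:", round(ratio ** 0.5, 1))
```

The variance ratios come out as 1.3 and 2.2, and the standard deviation ratios as 1.2 and 1.5, for the equally spaced and clustered designs respectively.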

3.2.  Straight  line  through  the  origin. 

Suppose the model $y = \beta x + \varepsilon$ is fitted to
observations taken at x = 1, 2, 3, 4, 5. The variances of the
residuals, $(1 - x_j^2/\sum_i x_i^2)\sigma^2$, are $\sigma^2$ times

.98, .93, .84, .71, .55

respectively. The steady decrease in variance as x increases, with
a factor of 1.8 (1.3 in standard deviation) between lowest and
highest, might be marginally detectable in a residuals plot and
might lead to misleading conclusions unless the possible danger was
realized.
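For a line through the origin the diagonal elements of $R$ reduce to $r_{jj} = x_j^2/\sum_i x_i^2$, so the factors can be evaluated directly. The sketch below is illustrative; the function name is an assumption.

```python
def origin_line_residual_variance_factors(x):
    """V(e_j)/sigma^2 for the model y = beta*x + error fitted at sites x."""
    sxx = sum(xi ** 2 for xi in x)
    return [1.0 - xi ** 2 / sxx for xi in x]


factors = origin_line_residual_variance_factors([1, 2, 3, 4, 5])
ratio = max(factors) / min(factors)
print([round(f, 2) for f in factors])
print("variance ratio:", round(ratio, 1), " std. dev. ratio:", round(ratio ** 0.5, 1))
```

The ratio of largest to smallest variance is 1.8, or 1.3 on the standard deviation scale.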
3.3.  First order rotatable designs.

A common design used for fitting a first order model is a two level
factorial or fractional factorial ($2^{k-p}$) with one or more
center points. Since the design is orthogonal and hence first order
rotatable, there are but two kinds of sites as far as the variance
of a residual is concerned. The variance of $b_0$ is clearly
$\sigma^2/n$, where $n = n_0 + 2^{k-p}$ and $n_0$ is the number of
replicated center points, so that a center point residual has
variance $(n-1)\sigma^2/n$. Equation (14) can then be used as an
easy means of calculating $V(e_f)$, the variance of the residuals
at the $2^{k-p}$ factorial points. We find

$$ n^{-1}\{n_0 (n-1)\sigma^2/n + 2^{k-p} V(e_f)\} = (n-k-1)\sigma^2/n \tag{15} $$

or

$$ V(e_f) = \sigma^2 \{(n-1)(n-n_0) - nk\}/\{n\,2^{k-p}\}. \tag{16} $$

Thus the ratio of the center point residual variance to that at the
factorial points is

$$ \frac{V(e_{c.p.})}{V(e_f)} = \frac{2^{k-p}(n-1)}{(n-1)(n-n_0) - nk}. \tag{17} $$

When a $2^3$ design with two center points is used, for example,
this ratio is 1.7. The standard deviation is therefore 1.3 times as
large for the center point residual as for the factorial points and
this is of marginal importance.
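The $2^3$ example can be checked numerically. The sketch below evaluates the center point factor $(n-1)/n$ and the factorial point factor $\{(n-1)(n-n_0) - nk\}/\{n\,2^{k-p}\}$ for given k, p and $n_0$; the function name is an assumption made for illustration.

```python
def first_order_factorial_factors(k, p, n0):
    """Return (center, factorial) values of V(e)/sigma^2 for a 2^(k-p) design with n0 center points."""
    nf = 2 ** (k - p)          # number of factorial points
    n = nf + n0                # total number of observations
    center = (n - 1) / n
    factorial = ((n - 1) * (n - n0) - n * k) / (n * nf)
    return center, factorial


center, factorial = first_order_factorial_factors(k=3, p=0, n0=2)
print(center, factorial)
print("variance ratio:", round(center / factorial, 1),
      " std. dev. ratio:", round((center / factorial) ** 0.5, 1))
```

As a consistency check, the weighted average $(n_0 \cdot \text{center} + 2^{k-p} \cdot \text{factorial})/n$ equals $(n-k-1)/n$, as the average-variance result requires.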

The simplex design, which is a k+1 point orthogonal design in k
variables, provides perhaps the worst example for a symmetrical
design. By the same procedure as used above it is easy to show that

$$ \frac{V(e_{c.p.})}{V(e_f)} = \frac{(k+1)(k+n_0)}{n_0}. \tag{18} $$

Here when $k = 3$ and $n_0 = 2$ the ratio is 10 and one should not
be surprised by very large residuals at the center points. (This is
an extreme example, however, because the residuals would also be
highly correlated since the residual mean square has only 2 d.f.)
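The ratio for the simplex design can be derived in a few lines. The sketch below assumes, as for the factorial case, that a center point is predicted by $b_0$ alone, so that its residual has variance $(n-1)\sigma^2/n$, and that the residual variances sum to $(n-q)\sigma^2$ with $q = k+1$.

```latex
\begin{aligned}
n &= k + 1 + n_0, \qquad
V(e_{c.p.}) = \frac{n-1}{n}\,\sigma^2 = \frac{k+n_0}{n}\,\sigma^2, \\
(k+1)\,V(e_f) + n_0\,\frac{k+n_0}{n}\,\sigma^2 &= (n-k-1)\,\sigma^2 = n_0\,\sigma^2
\;\Longrightarrow\;
V(e_f) = \frac{n_0}{n(k+1)}\,\sigma^2, \\
\frac{V(e_{c.p.})}{V(e_f)} &= \frac{(k+1)(k+n_0)}{n_0},
\qquad k = 3,\ n_0 = 2:\ \frac{4 \cdot 5}{2} = 10 .
\end{aligned}
```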


3.4.  Second  order  designs. 

3.4.1.  Central  composite  designs. 

In fitting a second order model (equation 12), commonly used
experimental designs are the central composite designs of Box &
Hunter (1957). For a k-factor design these consist of the following
points.

    $(\pm 1, \pm 1, \ldots, \pm 1)$                                   $2^k$ or $2^{k-p}$ factorial points
    $(\pm\alpha, 0, \ldots, 0), \ldots, (0, \ldots, 0, \pm\alpha)$    $2k$ axial points                      (19)
    $(0, 0, \ldots, 0)$                                               $n_0$ center points

If the value of $\alpha$ is chosen as $2^{(k-p)/4}$, $V(\hat{y})$ is
a function only of the distance of the point in the factor space
from the center of the design. Designs with this property are
called rotatable designs and lead to, at most, three different
$V(e_j)$, one for all the factorial points, one for the axial
points and one for the center points. In some cases the axial
points and factorial points lie on the same hypersphere and have
the same $V(e_j)$. For the three- and four-factor designs the
following results are obtained.


                                               $\sigma^{-2} V(e_j)$
Factors (k)   No. of Center Points   Factorial   Axial   Center

     3                 1                .33       .39     .01
                       2                .33       .39     .50
                       3                .33       .39     .67
                       6                .33       .39     .83

     4                 1                .42       .42     0
                       2                .42       .42     .50
                       3                .42       .42     .67
                       7                .42       .42     .86

The recommended numbers of center points are 6 (for k=3) and 7 (for
k=4). Only with these higher numbers of replicates do the center
point residual variances become large relative to the others; the
ratios of the standard deviation of a residual at the center to
that at a factorial point location are 1.6 (for k=3) and 1.4 (for
k=4).
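The three-factor values can be recomputed from the design itself. The Python sketch below builds a full (unfractionated) central composite design with the rotatable choice $\alpha = (2^k)^{1/4}$, expands it to the second order model, and reads off $1 - r_{jj}$ at a factorial, an axial and a center point. All function names are assumptions made for illustration, and the matrix inversion is a plain Gauss-Jordan routine rather than anything prescribed by the paper.

```python
from itertools import product


def hat_diagonal(X):
    """Return the diagonal r_jj of R = X (X'X)^{-1} X' for a full-rank X (list of rows)."""
    n, q = len(X), len(X[0])
    A = [[sum(X[t][i] * X[t][j] for t in range(n)) for j in range(q)]
         + [1.0 if i == j else 0.0 for j in range(q)] for i in range(q)]
    for c in range(q):  # Gauss-Jordan on [X'X | I] with partial pivoting
        p = max(range(c, q), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for r in range(q):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    inv = [row[q:] for row in A]
    return [sum(x[i] * inv[i][j] * x[j] for i in range(q) for j in range(q)) for x in X]


def second_order_row(x):
    """Model terms 1, x_i, x_i^2, x_i*x_j (i < j) for one design point."""
    k = len(x)
    return ([1.0] + list(x) + [v * v for v in x]
            + [x[i] * x[j] for i in range(k) for j in range(i + 1, k)])


def ccd_factors(k, n0):
    """Return V(e)/sigma^2 at a (factorial, axial, center) point of a rotatable CCD."""
    alpha = (2 ** k) ** 0.25
    pts = [[float(c) for c in p] for p in product((-1, 1), repeat=k)]
    for i in range(k):
        for a in (-alpha, alpha):
            pts.append([a if j == i else 0.0 for j in range(k)])
    pts += [[0.0] * k for _ in range(n0)]
    v = [1.0 - r for r in hat_diagonal([second_order_row(p) for p in pts])]
    return v[0], v[2 ** k], v[-1]


table = {n0: tuple(round(v, 2) for v in ccd_factors(3, n0)) for n0 in (1, 2, 3, 6)}
for n0, row in table.items():
    print(n0, row)
```

The factorial and axial factors barely move as $n_0$ changes, while the center factor climbs toward 1, which is the pattern the tabulated values show.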

3.4.2.  Three level second order designs.

The three level designs of Box and Behnken (1960) have only two
different kinds of sites as far as residual variance is concerned,
center points and factorial points. Since a center point residual
always has variance $(n_0 - 1)\sigma^2/n_0$, equation (14) readily
provides the factorial point residual variance $V(e_f)$ as

$$ \sigma^{-2} V(e_f) = 1 - k(k+3)/(2n_f), \tag{20} $$

where $n_f$ is the number of factorial points. The design for three
factors consisting of the points

     x1     x2     x3

     ±1     ±1      0   \
     ±1      0     ±1    }  12 factorial points
      0     ±1     ±1   /

      0      0      0       $n_0$ center points

yields the following results:


                                              $\sigma^{-2} V(e_j)$
Factors (k)   No. of Center Points ($n_0$)   Factorial   Center

     3                    1                     .25        0
                          2                     .25        .5
                          3                     .25        .67
                          4                     .25        .75


The ratio of the standard deviations is 1.6 for $n_0 = 3$, and 1.7
for $n_0 = 4$.

The three level design for four factors is

     x1     x2     x3     x4

     ±1     ±1      0      0   \
      0      0     ±1     ±1    |
     ±1      0      0     ±1    |
      0     ±1     ±1      0    }  24 factorial points
     ±1      0     ±1      0    |
      0     ±1      0     ±1   /

      0      0      0      0       $n_0$ center points

This design, however, is a rotation of the four factor central
composite design given above in Section 3.4.1 and hence has
identical variance values. It is interesting to note that, for all
three level designs, $V(e_f)$ is independent of $n_0$, as equation
(20) shows.
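The step from the average-variance result to the factorial point variance can be written out explicitly. The derivation below is a sketch that uses only the center point variance $(n_0 - 1)\sigma^2/n_0$ and the number of parameters $q = (k+1)(k+2)/2$ of the second order model, followed by the arithmetic for the two designs discussed.

```latex
\begin{aligned}
n_f\,V(e_f) + n_0\,\frac{n_0 - 1}{n_0}\,\sigma^2 &= (n_f + n_0 - q)\,\sigma^2
\;\Longrightarrow\;
n_f\,V(e_f) = \{n_f - (q - 1)\}\,\sigma^2, \\
q - 1 = \frac{(k+1)(k+2)}{2} - 1 = \frac{k(k+3)}{2}
&\;\Longrightarrow\;
\sigma^{-2}\,V(e_f) = 1 - \frac{k(k+3)}{2 n_f}, \\
k = 3,\ n_f = 12:\quad 1 - \frac{18}{24} = 0.25,
&\qquad
k = 4,\ n_f = 24:\quad 1 - \frac{28}{48} \approx 0.42 .
\end{aligned}
```

Because $n_0$ cancels in the first line, the factorial point variance does not depend on the number of center points.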

The above designs can be regarded as incomplete three level
factorial designs. It is also possible to use complete $3^k$
designs. These designs produce k+1 nominally different kinds of
sites; we shall list the $\sigma^{-2} V(e_j)$ for the cases k=2 and
k=3 to illustrate the patterns.


Factors (k)   Typical Coordinate    $\sigma^{-2} V(e_j)$

     2         (±1, ±1)                  .19
               (±1,  0)                  .44
               ( 0,  0)                  .44

     3         (±1, ±1, ±1)              .49
               (±1, ±1,  0)              .66
               (±1,  0,  0)              .74
               ( 0,  0,  0)              .74
It would appear from these examples that the variation in the
standard deviations of residuals from the complete $3^k$ three
level designs is not large.
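The values for the complete three level factorials can be recomputed by expanding every point of the $3^k$ grid to the second order model and grouping the sites by their number of nonzero coordinates. The Python sketch below is illustrative; the function names and the Gauss-Jordan inversion are assumptions, not part of the paper.

```python
from itertools import product


def hat_diagonal(X):
    """Return the diagonal r_jj of R = X (X'X)^{-1} X' for a full-rank X (list of rows)."""
    n, q = len(X), len(X[0])
    A = [[sum(X[t][i] * X[t][j] for t in range(n)) for j in range(q)]
         + [1.0 if i == j else 0.0 for j in range(q)] for i in range(q)]
    for c in range(q):  # Gauss-Jordan on [X'X | I] with partial pivoting
        p = max(range(c, q), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for r in range(q):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    inv = [row[q:] for row in A]
    return [sum(x[i] * inv[i][j] * x[j] for i in range(q) for j in range(q)) for x in X]


def second_order_row(x):
    """Model terms 1, x_i, x_i^2, x_i*x_j (i < j) for one design point."""
    k = len(x)
    return ([1.0] + list(x) + [v * v for v in x]
            + [x[i] * x[j] for i in range(k) for j in range(i + 1, k)])


def full_three_level_factors(k):
    """Map 'number of nonzero coordinates' to V(e)/sigma^2 (rounded) for a complete 3^k design."""
    pts = [[float(c) for c in p] for p in product((-1, 0, 1), repeat=k)]
    v = [1.0 - r for r in hat_diagonal([second_order_row(p) for p in pts])]
    by_type = {}
    for p, vj in zip(pts, v):
        by_type[sum(1 for c in p if c != 0)] = round(vj, 2)
    return by_type


print(full_three_level_factors(2))
print(full_three_level_factors(3))
```

Sites with the same number of nonzero coordinates are equivalent under the symmetries of the design, which is why a single representative per group suffices.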

3.5.  Some undesigned data examples.

One might expect considerable variation among the $V(e_j)$ when
data are taken without following symmetrical experimental designs.
Two such examples, taken from Draper and Smith, will be considered
next, to provide some insight into what actually does occur.

Example 1.  This example (p. 366) uses data previously given by
Hald. Here, only the variables $x_1$ and $x_2$, which are adequate
for representation purposes as described in Draper and Smith (1966,
p. 165), are considered. Table 1 provides the x-coordinates, the
residuals from a fitted first order model
$\hat{y} = 52.577 + 1.468x_1 + 0.662x_2$, the variance of the
residuals divided by $\sigma^2$, and two columns of standardized
residuals. The first of these, $e_j/s$, has as denominator the root
mean square error $s = (5.79)^{1/2}$, taken from the analysis of
variance table; the second is obtained from
$e_j/\{\text{Estimated } V(e_j)\}^{1/2} = e_j/s(e_j)$, where the
denominator is obtained by substituting $s^2 = 5.79$ for $\sigma^2$
in $\{V(e_j)\}^{1/2}$. The pattern of points $(x_1, x_2)$ is shown
in Figure 2 with the scale chosen so that the spread of the data
points is roughly the same in both the $x_1$ and $x_2$ directions.
We see that the data points are fairly well spread but the point
(21, 47) is isolated. Nevertheless this causes no difficulty with
this particular set of data and it is clear from Table 1 that the
$e_j/s(e_j)$ and $e_j/s$ plots will not be sufficiently different
to require use of the former rather than the latter. It is clear,
however, that the relative scaling of the normalized residual for
point 10 by a factor of 1.4 compared to that for point 5 could
change one's interpretation of a residual plot if large residuals
had been obtained at these points.
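The factor of 1.4 between points 10 and 5 can be checked from the x-coordinates alone, using $V(e_j)/\sigma^2 = (n-1)/n - t_{jj}$ with the 2 x 2 inverse of the centered cross-product matrix written out. The Python sketch below is illustrative: the function name is an assumption, and the eighth $x_1$ value, which is illegible in this copy of Table 1, is taken to be 1 as in the published Hald data.

```python
# x-coordinates of the 13 Hald observations (x1 for observation 8 assumed to be 1).
x1 = [7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10]
x2 = [26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68]


def two_variable_factors(x1, x2):
    """V(e_j)/sigma^2 for a first order model with constant term in two variables."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    d1 = [a - m1 for a in x1]
    d2 = [b - m2 for b in x2]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    det = s11 * s22 - s12 ** 2
    # t_jj = d_j' (D'D)^{-1} d_j, with the 2x2 inverse expanded explicitly.
    t = [(s22 * a * a - 2 * s12 * a * b + s11 * b * b) / det for a, b in zip(d1, d2)]
    return [(n - 1) / n - tj for tj in t]


factors = two_variable_factors(x1, x2)
print([round(f, 2) for f in factors])
print("point 5:", round(factors[4], 2), " point 10:", round(factors[9], 2),
      " std. dev. ratio:", round((factors[4] / factors[9]) ** 0.5, 1))
```

The isolated point (21, 47) has by far the smallest factor, so its residual is stretched the most when divided by its own estimated standard deviation.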
TABLE 1.  Hald data example.


  j    x1    x2    $\sigma^{-2}V(e_j)$    $e_j$    $e_j/s$    $e_j/s(e_j)$

  1     7    26          .75              -1.6      -.66        -0.77
  2     1    29          .73               1.0       .42         0.48
  3    11    56          .88              -1.5      -.62        -0.66
  4    11    31          .76              -1.7      -.71        -0.81
  5     7    52          .92              -1.4      -.58        -0.61
  6    11    55          .88               4.0      1.66         1.77
  7     3    71          .64              -1.3      -.54        -0.68
  8     1    31          .76              -2.1      -.87        -1.00
  9     2    54          .82               1.8       .75         0.83
 10    21    47          .45               1.4       .58         0.87
 11     1    40          .82               3.3      1.37         1.52
 12    11    66          .80               0.9       .37         0.42
 13    10    68          .79              -2.9     -1.20        -1.36



3.6.  Conclusions.

In making residual plots to check on the validity of model
assumptions it appears that, in many situations, little is lost by
failing to take into account the differences in the variances of
the residuals. However, it is a potential problem and, since most
large regression programs already provide the estimated
$V(\hat{y}_j)$, it is a simple matter to add the calculation of the
estimated $V(e_j)$ as well and we recommend this as a routine
procedure. The normalization of the residuals by their estimated
standard deviations can easily be performed as an additional
option.
Figure 1a.  Typical band of 95% intervals for E(y|x) from five
equally spaced observations.

Figure 1b.  "Balloon" pattern of standard deviations of residuals
from a first order model, for five observations.